Tolstoy Digital: Mining Biographical Data in Literary Heritage Editions

Authors

  • Anastasia Bonch-Osmolovskaya
  • Matvey Kolbasov
Abstract

This paper presents a solution for mining biographical information from the commentaries on Leo Tolstoy’s letters. It is implemented as part of the Tolstoy Digital project, a semantically marked-up web publication of the 90-volume complete collection of Leo Tolstoy’s works. The extracted biographical information will be used to create an open database of all persons who were connected in some way with Tolstoy or his works. The paper also accounts for various subtleties of the commentary apparatus and pays special attention to specific difficulties of biographical information extraction, such as defining the boundaries of expressions that denote a profession and handling non-standardized syntactic constructions for kinship relations.

Similar resources

Toward Algorithmic Discovery of Biographical Information in Local Gazetteers of Ancient China

Difangzhi (地方志) is a large collection of local gazetteers compiled by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information into the China Biographical Dat...

Words in Contexts: Digital Editions of Literary Journals in the "AAC - Austrian Academy Corpus"

In this paper two highly innovative digital editions will be presented. For the creation and the implementation of these editions the latest developments within corpus research have been taken into account. The digital editions of the historical literary journals "Die Fackel" (published by Karl Kraus in Vienna from 1899 to 1936) and "Der Brenner" (published by Ludwig Ficker in Innsbruck from 19...

Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works

This article describes a joint research work between Monash University and the University of Alicante, where software originally meant for plagiarism and copy detection in academic works is successfully applied to perform comparative analysis of different editions of literary works. The experiments were performed with Spanish texts from the Miguel de Cervantes digital library. The results have p...

Digitisation of Literary Heritage Using Open Standards

The paper presents the methodology, technology and results of a collaborative Slovenian project aimed at e-publishing text-critical editions of literary heritage. The materials exhibit great complexity, as they are made available not only in facsimile but also in several interconnected transcriptions, and can include notes, glossaries, dictionaries, links to external resources, multimedia prese...

A.P. Chekhov’s Works in Interpretation by F.E. Paktovsky

This paper presents a conceptual analysis of the essay by Paktovsky (1901) which concentrates on the works by Chekhov. The urgency of the research is determined by the significance of the literary figure for the history of Russian criticism of the 19th – 20th centuries, the importance of his vision concerning the writing of the authors of Russian literature of the turn of the century, as well a...

Publication date: 2015